

Institut des Hautes Etudes Scientifiques 
35, route de Chartres
91440 - Bures-sur-Yvette (France)


May 1993




IHES/M/93/21




PB93-210979 




QUANTITATIVE CHARACTERISTICS OF PRIMARY AMINO ACID
Arnold J. Mandell and Karen A. Selz
The quantitative prediction of the tertiary structure of proteins, as defined by their x-ray crystallographic
coordinates, using statistical physical and/or symbolic characteristics of the primary amino acid sequence is a
long standing problem in biopolymer physics. An erstwhile missing feature of protein structural data has been
a measure for the mapping of a set of x-ray observables onto a single real number as a continuously distributed
descriptor which could then serve as the object of quantitative prediction. Such a global measure using the
x-ray coordinates of protein crystals has been developed by Stapleton and associates (refs. 1-3). Computation of the
range of inter-α-carbon distances indicated that there were protein specific, statistically reliable, fractional
power laws (we call them "Stapleton protein fractal dimensions," d_S) relating the amino acid monomeric
mass density to the α-carbon distances, with values ranging from 1.26 to 1.87. Although insufficient in orders
of magnitude of length to qualify for the definition of fractals, intuitively, the values appear related to the
space filling aspects of tertiary structure. For example, the "curled up," barrel dominated proteins such as
equine hemoglobin A and B and sperm whale myoglobin manifested d_S = 1.65, whereas in the "stretched
out," more "random" chains such as protease A and B from S. griseus, d_S = 1.31 and 1.32 respectively.
From similar computations yielding the "fractal dimension" of a polymer represented by the orbit of a self
avoiding random walk in three dimensions, it is speculated that the upper bound on d_S is in the vicinity of
5/3.
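The idea of a fractional power law relating residue mass to α-carbon distance can be illustrated with a minimal sketch. It is not the Stapleton group's procedure: the function name `mass_scaling_dimension`, the choice of radii, and the plain least-squares fit are all assumptions made for illustration. The sketch counts, for each α-carbon, how many α-carbons (itself included) lie within a radius r, averages that count over the chain, and reports the slope of log(count) against log(r).

```python
import math

def mass_scaling_dimension(coords, radii):
    """Estimate a mass-scaling exponent from 3-D points (hypothetical helper).

    coords: list of (x, y, z) alpha-carbon positions.
    radii:  list of positive probe radii, in the same units as coords.
    Returns the least-squares slope of log(mean count within r) vs log(r).
    """
    log_r, log_n = [], []
    for r in radii:
        total = 0
        for p in coords:
            # count every point within r of p, including p itself
            total += sum(1 for q in coords if math.dist(p, q) <= r)
        log_r.append(math.log(r))
        log_n.append(math.log(total / len(coords)))
    # ordinary least-squares slope of log_n on log_r
    mean_r = sum(log_r) / len(log_r)
    mean_n = sum(log_n) / len(log_n)
    num = sum((x - mean_r) * (y - mean_n) for x, y in zip(log_r, log_n))
    den = sum((x - mean_r) ** 2 for x in log_r)
    return num / den
```

For a fully extended chain of equally spaced points the estimate comes out close to 1, while more compact arrangements give larger values, which is the qualitative contrast drawn between "stretched out" and "curled up" proteins.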


Chothia's studies relating a protein's conformation deduced from x-ray crystallographic data (ref. 4) to its amino
acid side chain hydrophobicity values (in kcal mol^-1), derived from the studies of Nozaki and Tanford (ref. 5),
suggested a measure map to the reals for the amino acid sequence as quantitative predictors. Each protein
can be transformed into a hydrophobic sequence from which a statistical model to predict the proteins'
values for d_S could be developed. A representative set of amino acid-hydrophobic transformations (ref. 6) in
kcal mol^-1 yield: q = 0.00, s = 0.07, t = 0.07, n = 0.09, g = 0.10, d = 0.66, e = 0.67, r = 0.85, a = 0.87, h = 0.87,
c = 1.32, k = 1.64, m = 1.67, v = 1.87, l = 2.17, y = 2.67, p = 2.77, f = 2.87, i = 3.15, and w = 3.70.
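The transformation of a protein into a hydrophobic sequence amounts to a table lookup, sketched below. The name `hydrophobic_sequence` is hypothetical, and the table follows the values as they can be read from this scan; several entries were damaged in the source, so the numbers should be treated as assumptions rather than as the authors' exact scale.

```python
# Per-residue hydrophobicity in kcal/mol, keyed by one-letter code.
# Values are transcribed from a damaged scan and are assumptions.
HYDROPHOBICITY = {
    "q": 0.00, "s": 0.07, "t": 0.07, "n": 0.09, "g": 0.10,
    "d": 0.66, "e": 0.67, "r": 0.85, "a": 0.87, "h": 0.87,
    "c": 1.32, "k": 1.64, "m": 1.67, "v": 1.87, "l": 2.17,
    "y": 2.67, "p": 2.77, "f": 2.87, "i": 3.15, "w": 3.70,
}

def hydrophobic_sequence(residues):
    """Map a one-letter amino acid string to a list of hydrophobicity values."""
    return [HYDROPHOBICITY[aa] for aa in residues.lower()]
```

A call such as `hydrophobic_sequence("QLW")` returns one number per residue, and it is this numeric series that the statistical measures are then computed on.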


That relationships between measures on the hydrophobic sequence and d_S have physical meaning is suggested by
two groups of research findings: (1) The related studies by the Stapleton group yielding solvent ionic
strength-sensitive densities of low frequency (< 300 cm^-1) vibrational state fractional power laws, when probed
by temperature dependent, Raman spin-lattice relaxation techniques in heme and iron-sulfur proteins, which were
very similar to the proteins' values for d_S (refs. 7-9); (2) Calorimetric studies of the specific heats of proteins
consistent with the presence of internal "soft" modes with low fundamental frequencies (in the hundreds of cm^-1
or below), easily excitable and subject to the influence of hydrophobic factors in folding, ligand binding, and
ionic environment.

One might relate these two ideas to the space-filling dependence of d_S with the idea that protein relaxation
dynamics might vary in temporal complexity when comparing the potential motions along the spatially
one-dimensional amino acid peptide backbone with the more complex, hierarchical multimodal, higher
dimensional case involving hydrophobic interactions of the amino acid side chains (off backbone connectivity)
in addition to one dimensional pathways. The distributions of modes, ρ(τ), between τ and τ + δτ, would follow a
fractional power law in τ with the exponent set by d_S, 1 < d_S < 2. These considerations motivated the
development of quantities on the hydrophobic sequence which might describe the potential for hydrophobic "mode"
structure predictive of Stapleton's measure, d_S.

In 35 proteins representing the range of reported values for d_S, we studied the relationships between d_S
obtained in two series of studies by the Stapleton group (ref. 1) and four statistical transformations on the
hydrophobic sequence. The transformations included: (1) As a statistical modulus, the average hydrophobicity per
amino acid residue (in kcal mol^-1); (2) As a statistical wavelength, the average inter-high hydrophobic run
interval in number of amino acid residues, in which the hydrophobicity values for the amino acid sequences
were partitioned into 0 for values below that of leucine (2.17) and 1 for values at or above that of leucine;
(3) As an estimate of longer range order in the hydrophobic sequence partitioned into four bins (letters) of five
amino acids each, the longest "word" in number of amino acid residues (a word is defined as a sequence of amino
acid residue transformations that appears at least twice along the length of the protein). The run interval
yields values sensitive to small structure (for example, the hemoglobins and myoglobin, with high densities of
α-helices, manifest the expected helix-like values) and is similar to the rotation number used in studies of
two dimensional reductions of three dimensional dynamical systems (ref. 11); the longest word is derived from
symbolic dynamics and lexical compression algorithms. (4) As a correction term, we used the percent of the
sequence length that was proline (%PRO), its role as a "structure breaker" in native protein conformations
being well known.
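The four transformations can be sketched in a few short functions. All function names are hypothetical, and several definitions are assumptions where the text leaves room: the run interval is taken as the mean spacing between the starts of consecutive runs of high-hydrophobicity residues, the four letters are assigned by ranking the 20 amino acids by hydrophobicity and cutting into groups of five, and repeated words are allowed to overlap. The `scale` argument is any mapping from one-letter codes to hydrophobicity values.

```python
def mean_hydrophobicity(values):
    """(1) Average hydrophobicity per residue."""
    return sum(values) / len(values)

def mean_run_interval(values, threshold):
    """(2) Mean spacing, in residues, between starts of high-hydrophobicity runs.

    Residues with value >= threshold are coded 1, others 0.
    Returns None when fewer than two runs are present.
    """
    bits = [1 if v >= threshold else 0 for v in values]
    starts = [i for i, b in enumerate(bits)
              if b == 1 and (i == 0 or bits[i - 1] == 0)]
    if len(starts) < 2:
        return None
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)

def four_letter_code(residues, scale):
    """Recode residues into letters A-D by rank of hydrophobicity (5 per bin)."""
    ranked = sorted(scale, key=scale.get)  # 20 residues, ascending
    letter = {aa: "ABCD"[i // 5] for i, aa in enumerate(ranked)}
    return "".join(letter[aa] for aa in residues.lower())

def longest_repeated_word(code):
    """(3) Length of the longest substring occurring at least twice."""
    n = len(code)
    best = 0
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        cur = [0] * (n + 1)
        for j in range(i + 1, n + 1):
            if code[i - 1] == code[j - 1]:
                # extend the match ending at positions i-1 and j-1
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

def percent_proline(residues):
    """(4) Percent of the sequence length that is proline."""
    return 100.0 * residues.lower().count("p") / len(residues)
```

The repeated-word search is quadratic in sequence length, which is adequate for proteins of a few hundred residues; a suffix-based method would be the natural replacement for much longer sequences.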


We remind ourselves of the Erdős-Rényi "new law of large numbers," which says that the longest expected
repetition length in a random sequence of N symbols is asymptotically log_{1/p} N. The longest word exceeded
this value for all proteins studied. For example, for the four letter code (p = .25), the expected longest word
length was near 3.6 for hemoglobin and near 3.7 for protease, and in both proteins considerably longer words
were observed; for elastase it was 3.95 versus 13.
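The random-sequence expectation is a one-line calculation, sketched here with a hypothetical function name. The chain length of 240 residues used for elastase is an assumption not stated in the text, chosen because it reproduces the quoted expectation of 3.95 for a four letter code.

```python
import math

def expected_longest_repeat(n_symbols, p):
    """Asymptotic expected longest repetition length, log base 1/p of N."""
    return math.log(n_symbols) / math.log(1.0 / p)

# Four equiprobable letters (p = 0.25), assumed chain length of 240 residues.
elastase_expectation = expected_longest_repeat(240, 0.25)
```

Comparing an observed longest word against this value gives a quick sense of how far a recoded sequence departs from a random one of the same length.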


The proteins studied were: protease A and B (S. griseus), myoglobin (sperm whale), rhodanese (bovine),
staphylococcal nuclease (S. aureus), glyceraldehyde dehydrogenase (lobster), thermolysin (B.
amyloliquefaciens), thioredoxin (E. coli), adenylate kinase (porcine), alcohol dehydrogenase (equine), algal
ferredoxin (S. platensis), carbonic anhydrase B and C (human), carboxypeptidase A and B (bovine), concanavalin A
(jack bean), cytochromes C (albacore), B5 (bovine), C2 (R. rubrum), C551 (P. aeruginosa), B562 (E. coli),
dihydrofolate reductase (L. casei), elastase (porcine), flavodoxin (clostridium), hemerythrin B (P. gouldii),
hemoglobin A and B (equine), lactate dehydrogenase (dogfish), lysozyme (chicken), subtilisin inhibitor (S.
albogriseolus), superoxide dismutase (bovine), trypsin inhibitor (bovine), chymotrypsin A (bovine), papain
(papaya), and subtilisin (B. amyloliquefaciens).


Treating the continuous, transformed measures as predictors showed negligible linear intercorrelations, with
the exception of a strong negative relationship between the average hydrophobicity and the run interval
(r = -0.805) (the more dense the ≥ 2.17 hydrophobic bursts, the higher the average hydrophobicity). Since these
measures were redundant with respect to d_S, two alternative regression models predicting d_S were constructed,
incorporating the run interval in one and the average hydrophobicity in the other. Using standardized regression
coefficients (i.e., β's), d_S-predictive Model I weights the run interval at -.216 alongside the longest word and
%PRO, and Model II weights the average hydrophobicity at +.227, the longest word at -.403, and %PRO at .205.
Model I resulted in an adjusted squared multiple R of 0.274 and a highly significant ANOVA [F(3,31) = 5.661,
p = 0.003]. Similarly, Model II gave an adjusted R² = 0.291 and an ANOVA of [F(3,31) = 5.332, p = 0.004].
These findings are consistent with our hypothesis that the values for d_S computed upon x-ray crystallographic
data from protein tertiary structure can be predicted from suitable transformations of the primary amino
acid hydrophobicity sequences of the proteins. That the longest word has a strong negative weighting with
respect to d_S suggests that the simple "fractal" interpretation (refs. 1-3) of d_S is insufficient.